Formal Concept Analysis Constrained by Attribute-Dependency Formulas
Authors: R. Bělohlávek and V. Sklenář
Abstract
An important problem in formal concept analysis is coping with the possibly large number of formal concepts extracted from a formal context (the input data). We propose a method for reducing the number of extracted formal concepts by means of constraints expressed by particular formulas (attribute-dependency formulas, ADF). ADF represent a form of dependencies, specified by a user, that express the relative importance of attributes. ADF are considered as additional input accompanying the formal context 〈X,Y, I〉. The reduction consists in considering only formal concepts which are compatible with a given set of ADF and leaving out noncompatible concepts. We present basic properties related to ADF, an algorithm for generating the reduced set of formal concepts, and illustrative examples.

1 Preliminaries and Problem Setting

We refer to [6] (see also [14]) for background information in formal concept analysis (FCA). We denote a formal context by 〈X,Y, I〉, i.e. I ⊆ X×Y (an object-attribute data table with objects x ∈ X and attributes y ∈ Y); the concept-deriving operators by ↑ and ↓, i.e. for A ⊆ X, A↑ = {y ∈ Y | for each x ∈ A : 〈x, y〉 ∈ I}, and dually for ↓; and the concept lattice of 〈X,Y, I〉 by B (X,Y, I), i.e. B (X,Y, I) = {〈A,B〉 ∈ 2^X × 2^Y | A↑ = B, B↓ = A}.

An important aspect of FCA is the possibly large number of formal concepts in B (X,Y, I). Very often, these include formal concepts which are in a sense not interesting for the expert. In this paper, we present a way to naturally reduce the number of formal concepts extracted from data by taking into account information supplied in addition to the input data table (formal context). We consider a particular form of this additional information, namely attribute dependencies expressed by (logical) formulas that can be supplied by an expert or user. The primary interpretation of the dependencies is to express a kind of relative importance of attributes.
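The notation from the preliminaries can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the toy context (three appliances, four attributes) and the names `up`, `down` and `concepts` are hypothetical and do not come from the paper, and the brute-force enumeration over attribute subsets is chosen for readability, not efficiency.

```python
from itertools import combinations

# Hypothetical toy context <X, Y, I>: each object maps to the attributes it has.
context = {
    "kettle": {"kitchen", "cheap"},
    "toaster": {"kitchen", "cheap", "red"},
    "printer": {"office", "red"},
}
attributes = {"kitchen", "office", "cheap", "red"}


def up(objs, context, attributes):
    """A-up: the attributes shared by every object in objs."""
    return frozenset(y for y in attributes if all(y in context[x] for x in objs))


def down(attrs, context):
    """B-down: the objects that have every attribute in attrs."""
    return frozenset(x for x, ys in context.items() if attrs <= ys)


def concepts(context, attributes):
    """All pairs <A, B> with A-up = B and B-down = A.

    Every concept has the form <S-down, S-down-up> for some attribute
    subset S, so closing each subset finds them all (exponential in |Y|).
    """
    found = set()
    for r in range(len(attributes) + 1):
        for subset in combinations(sorted(attributes), r):
            extent = down(frozenset(subset), context)
            intent = up(extent, context, attributes)
            found.add((extent, intent))
    return found


if __name__ == "__main__":
    for extent, intent in sorted(concepts(context, attributes), key=lambda c: len(c[1])):
        print(sorted(extent), sorted(intent))
```

Even this three-object context yields six formal concepts, which illustrates how quickly B (X,Y, I) can grow relative to the data table.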
We introduce the notion of a formal concept compatible with the attribute dependencies. The main gain of considering only compatible formal concepts and disregarding those which are not compatible is a reduction in the number of resulting formal concepts. This leads to a more comprehensible structure of formal concepts (clusters) extracted from the input data. We present basic theoretical results and an algorithm for generating compatible formal concepts, and we illustrate our approach by examples.

B. Ganter and R. Godin (Eds.): ICFCA 2005, LNCS 3403, pp. 176–191, 2005. © Springer-Verlag Berlin Heidelberg 2005

2 Constraints by Attribute Dependencies

2.1 Motivation

When people categorize objects by means of their attributes, they naturally take into account the importance of attributes. Usually, attributes which are less important are not used to form large categories (clusters, concepts). Rather, less important attributes are used to make a finer categorization within a larger category. For instance, consider a collection of products offered on a market, e.g. home appliances. When categorizing home appliances, one may consider several attributes such as price, the purpose of the appliance, the intended placement of the appliance (kitchen appliance, bathroom appliance, office appliance, etc.), power consumption, color, etc. Intuitively, when forming appliance categories, one picks the most important attributes and forms general categories like “kitchen appliances”, “office appliances”, etc. Then, one may use the less important attributes (like “price ≤ $10”, “price between $15–$40”, “price > $100”, etc.) and form categories like “kitchen appliance with price between $15–$40”. Within this category, one may further form finer categories distinguished by color.
This pattern of forming categories follows the rule that when an attribute y is to belong to a category, the category must contain an attribute which determines a more important characteristic of the attribute (like “kitchen appliance” determines the intended placement of the appliance). This must be true for all the characteristics that are more important than y. In this sense, the category “red appliance” is not well-formed, since color is considered less important than price and the category “red appliance” does not contain any information about the price. Which attributes and characteristics are considered more important depends on the particular purpose of the categorization. In the above example, it may well be the case that price is considered more important than the intended placement. Therefore, the information about the relative importance of the attributes is to be supplied by an expert (the person who determines the purpose of the categorization). Once the information has been supplied, it serves as a constraint on the formation of categories. In what follows, we propose a formal approach to the treatment of the above-described constraints on the formation of categories.

2.2 Constraints by Attribute-Dependency Formulas

Consider a formal context 〈X,Y, I〉. We consider constraints expressed by formulas of the form

y ⊑ y1 ⊔ · · · ⊔ yn. (1)

Formulas (1) will be called AD-formulas (attribute-dependency formulas). The set of all AD-formulas will be denoted by ADF. Let now C ⊆ ADF be a set of AD-formulas.

Definition 1. A formal concept 〈A,B〉 satisfies an AD-formula (1) if the following holds: if y ∈ B, then y1 ∈ B or · · · or yn ∈ B.

Remark 1. More generally, we could consider formulas l(y) ⊑ l(y1) ⊔ · · · ⊔ l(yn) where l(z) is either z or z̄. For instance, y ⊑ ȳ1 would be satisfied by 〈A,B〉 if, whenever y ∈ B, no object x ∈ A has y1. For the purpose of our paper, however, we consider only (1).
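Definition 1 translates almost directly into code. The following Python sketch is only an illustration: the representation of an AD-formula as a pair (y, [y1, ..., yn]), the function names `satisfies` and `compatible`, and the appliance data are assumptions made here, not notation from the paper. The concepts are listed by hand as (extent, intent) pairs so that the block is self-contained.

```python
def satisfies(intent, formula):
    """Definition 1: if y is in the intent B, at least one of y1, ..., yn must be in B."""
    y, alternatives = formula
    return y not in intent or any(z in intent for z in alternatives)


def compatible(concepts, formulas):
    """Keep the concepts <A, B> whose intent B satisfies every AD-formula in formulas."""
    return [c for c in concepts if all(satisfies(c[1], f) for f in formulas)]


# Hypothetical formal concepts (extent, intent) of a small appliance context.
concepts = [
    (frozenset({"kettle", "toaster", "printer"}), frozenset()),
    (frozenset({"kettle", "toaster"}), frozenset({"kitchen", "cheap"})),
    (frozenset({"toaster"}), frozenset({"kitchen", "cheap", "red"})),
    (frozenset({"printer"}), frozenset({"office", "red"})),
    (frozenset({"toaster", "printer"}), frozenset({"red"})),
    (frozenset(), frozenset({"kitchen", "office", "cheap", "red"})),
]

# Assumed importance ordering: color depends on price, price depends on placement.
formulas = [
    ("red", ["cheap"]),                # red ⊑ cheap
    ("cheap", ["kitchen", "office"]),  # cheap ⊑ kitchen ⊔ office
]

if __name__ == "__main__":
    for extent, intent in compatible(concepts, formulas):
        print(sorted(extent), sorted(intent))
```

With these two formulas, the concepts with intents {red} and {office, red} are left out because they mention color without any price information, which mirrors the “red appliance” example from the motivation.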
The fact that 〈A,B〉 ∈ B (X,Y, I) satisfies an AD-formula φ is denoted by 〈A,B〉 |= φ. Therefore, |= is the basic satisfaction relation (being a model) between the set B (X,Y, I) of all formal concepts (models, structures) and the set ADF of all AD-formulas (formulas). As usual, |= induces two mappings: Mod : 2^ADF → 2^B(X,Y,I), assigning a subset

Mod(C) = {〈A,B〉 ∈ B (X,Y, I) | 〈A,B〉 |= φ for each φ ∈ C}

to a set C ⊆ ADF of AD-formulas, and Fml : 2^B(X,Y,I) → 2^ADF, assigning a subset

Fml(U) = {φ ∈ ADF | 〈A,B〉 |= φ for each 〈A,B〉 ∈ U}

to a subset U ⊆ B (X,Y, I). The following result is immediate [12].

Theorem 1. The mappings Mod and Fml form a Galois connection between ADF and B (X,Y, I). That is, we have

C1 ⊆ C2 implies Mod(C2) ⊆ Mod(C1), (2)
C ⊆ Fml(Mod(C)), (3)
U1 ⊆ U2 implies Fml(U2) ⊆ Fml(U1), (4)
U ⊆ Mod(Fml(U)), (5)

for any C, C1, C2 ⊆ ADF and U, U1, U2 ⊆ B (X,Y, I).

Definition 2. For C ⊆ ADF we put B_C (X,Y, I) = Mod(C) and call it the constrained (by C) concept lattice induced by 〈X,Y, I〉 and C.

For simplicity, we also denote B_C (X,Y, I) by B_C. That is, B_C (X,Y, I) is the collection of all formal concepts from B (X,Y, I) which satisfy each AD-formula from C (satisfy all constraints from C). The following is immediate.

Theorem 2. B_C (X,Y, I) is a partially ordered subset of B (X,Y, I) which is bounded from below. Moreover, if C does not contain an AD-formula (1) such that y is shared by all objects from X and none of y1, . . . , yn is shared by all objects, then B_C (X,Y, I) is bounded from above.

Proof. Obviously, 〈Y↓, Y〉 is the least formal concept from B (X,Y, I) and it is compatible with each AD-formula. Therefore, 〈Y↓, Y〉 bounds B_C (X,Y, I) from below. Furthermore, if there is no AD-formula (1) with the above-mentioned properties, then 〈X,X↑〉 is the upper bound of B_C (X,Y, I), since in this case 〈X,X↑〉 clearly satisfies C.

Remark 2.
Note that the condition guaranteeing that B_C (X,Y, I) is bounded from above is usually satisfied. Namely, in most cases, there is no attribute shared by all objects and so X↑ = ∅, in which case the condition is fulfilled.

Let us now consider AD-formulas of the form

y ⊑ y′. (6)

Clearly, (6) is a particular form of (1) for n = 1. Constraints equivalent to (6) were considered in [1, 2]; see also [9] for a somewhat different perspective. In [1, 2], constraints are considered in the form of a binary relation R on a set of attributes (in [1]) or objects (in [2]). On the attributes, R might be a partial order expressing the importance of attributes; on the objects, R might be an equivalence relation expressing some partition of objects. If we restrict ourselves to AD-formulas (6), B_C (X,Y, I) is itself a complete lattice:

Theorem 3. Let C be a set of AD-formulas of the form (6). Then B_C (X,Y, I) is a complete lattice which is a ⋁-sublattice of B (X,Y, I).

Proof. Since B_C (X,Y, I) is bounded from below (Theorem 2), it suffices to show that B_C (X,Y, I) is closed under suprema in B (X,Y, I), i.e. that for 〈Aj, Bj〉 ∈ B_C (X,Y, I) we have 〈(∩j Bj)↓, ∩j Bj〉 ∈ B_C (X,Y, I). This can be directly verified: if y ⊑ y′ belongs to C and y ∈ ∩j Bj, then y ∈ Bj and hence y′ ∈ Bj for each j, so y′ ∈ ∩j Bj.

One can show that B_C (X,Y, I) in Theorem 3 need not be a ⋀-sublattice of B (X,Y, I). Note that 〈A,B〉 |= (y ⊑ y′) says that B contains y′ whenever it contains y. Hence, 〈A,B〉 |= {y ⊑ y′, y′ ⊑ y} if and only if either both y and y′ are in B or neither of them is. This seems to be interesting particularly in the dual case, i.e. when considering constraints on objects, to select only formal concepts which do not separate certain groups of objects (for instance, the groups may form a partition known from outside or generated from the formal context).

In the rest of this section we briefly discuss selected topics related to constraints by AD-formulas. Due to the limited space, we omit details.

2.3 Expressive Power of AD-Formulas

An attribute may occur on the left-hand side of several AD-formulas of C.
For example, we may have y ⊑ y1 ⊔ y2 and y ⊑ y3 ⊔ y4. Then, for a formal concept 〈A,B〉 to be compatible, it has to satisfy the following: whenever y ∈ B, it must be the case that y1 ∈ B or y2 ∈ B, and that y3 ∈ B or y4 ∈ B. Therefore, it is tempting to allow for expressions of the form
Publication date: 2005